require(tidyverse)
Loading required package: tidyverse
Registered S3 methods overwritten by 'dbplyr':
method from
print.tbl_lazy
print.tbl_sql
── Attaching packages ───────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.1 ──
✔ ggplot2 3.3.5 ✔ purrr 0.3.4
✔ tibble 3.1.6 ✔ dplyr 1.0.9
✔ tidyr 1.2.0 ✔ stringr 1.4.0
✔ readr 2.1.2 ✔ forcats 0.5.1
── Conflicts ──────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
Warning message:
replacing previous import ‘flowViz::contour’ by ‘graphics::contour’ when loading ‘flowStats’
require(flowCore)
Loading required package: flowCore
require(flowClust)
Loading required package: flowClust
Attaching package: ‘flowClust’
The following object is masked from ‘package:graphics’:
box
The following object is masked from ‘package:base’:
Map
require(openCyto)
Loading required package: openCyto
require(ggcyto)
Loading required package: ggcyto
Loading required package: ncdfFlow
Loading required package: RcppArmadillo
Loading required package: BH
Loading required package: flowWorkspace
As part of improvements to flowWorkspace, some behavior of
GatingSet objects has changed. For details, please read the section
titled "The cytoframe and cytoset classes" in the package vignette:
vignette("flowWorkspace-Introduction", "flowWorkspace")
require(cowplot)
Loading required package: cowplot
import the data from the RDSS (just once) and then write it to the local disk
Simplify the sample names
# import the longest substring function from the PTXQC package (https://rdrr.io/cran/PTXQC/man/LCSn.html)
source("../../script/20220326-simplify-names-subroutine.R")
oriNames <- sampleNames(fs)
shortNames <- simplifyNames(oriNames) %>%
gsub(".fcs","",.)
sampleNames(fs) <- shortNames
Metadata
sample <- read.csv("20220512-sample-list.csv") %>%
column_to_rownames(var = "file")
pData(fs) <- sample
Error in `phenoData<-`(`*tmp*`, value = pd) :
The sample names no longer match.
Next we use a series of plots to guide our gating strategy for identifying the population we want to work with. ### Remove outliers We first gate on FSC.H and SSC.H to remove outliers (events that are too big or too small). The Attune instrument we use can record six decades (100-106), with the first two decades mostly occupied by electronic noise.
Let’s first define a gate and visualize it in a plot before adding it to a GatingSet.
test <- shortNames[c(3, 12*2+3, 86, 12*7+3)]
outlier.gate <- rectangleGate(filterId = "-outlier", "FSC.H" = c(1.2e5, 1e6), "SSC.H" = c(1e2, 1e6))
ggcyto(fs[test], aes(x = FSC.H, y = SSC.H), subset = "root") +
geom_hex(bins = 64) + geom_gate(outlier.gate) + facet_wrap(~name, ncol = 2)# + ggcyto_par_set(limits = "instrument")
Add this gate to the GatingSet
gs <- GatingSet(fs) # create a GatingSet
gs_pop_add(gs, outlier.gate, parent = "root")
[1] 2
recompute(gs)
done!
Let’s examine how this gate intersected with the FSC.H vs FSC.W plot (for singlets)
p1 <- ggcyto(gs[["188-555-1"]], aes(x = FSC.H, y = FSC.W), subset = "root") + geom_hex(bins = 128)
p2 <- ggcyto(gs[["188-555-1"]], aes(x = FSC.H, y = FSC.W), subset = "-outlier") + geom_hex(bins = 128)
plot_grid(as.ggplot(p1), as.ggplot(p2), ncol = 2)
Next let’s remove multiplets on FSC.H vs FSC.W. To do this, we could
either manually set up a polygon gate, or use the automatic clustering
function provided by the flowClust package. Note that in
the original implementation, the flowClust() function or
the tmixFilter() version that was supposed to allow for
integration with the flowCore package, both were designed
with different downstream actions in mind than what I want to do here
(visualize with ggcyto() + geom_gate()). The
openCyto package written by the same group of authors who
created flowClust and ggcyto has a helper
function to make this possible. See this post for a
discussion on alternative ways to achieve this.
Update switch to a polygon gate as the clustering is not working well. Update 2022/03/26 skip the singlet gate altogether
Add this gate to the gatingSet
When I plotted the singlet events on GFP-RFP 2d space, I noticed a few samples that show more than one population of cells, where the main population appeared to be “induced” while one or more subpopulations are less or not induced. While the biological reasons behind require further investigation and may be very interesting (heterogeneity), for this analysis we will use flowClust to identify the main population and move forward.
ggcyto(gs, aes(x = BL1.H, y = YL2.H), subset = "-outlier") + geom_hex(bins = 64) +
facet_wrap(~name, ncol = 10) + scale_x_logicle() + scale_y_logicle() + theme_bw()
Be careful when working with the GatingSet and GatingHierarchy objects – these are strictly reference classes, meaning that most of the operations work by pointers and the operations will change the underlying data. For example, the first line of the code below (commented out) obtains a pointer to the underlying data rather than making a copy of that data. any operations on it will change the original data as a result.
#ex <- gs_pop_get_data(gs, "singlet")[[1]]
ex <- fs[["194-555-2"]]
#lgcl <- estimateLogicle(ex, channels = c("BL1.H", "YL2.H"))
lgcl <- logicleTransform("induction")
# set cluster gate parameters
k = 1; q = 0.9
# end setting
ex <- transform(ex, lgBL1.H = lgcl(`BL1.H`), lgYL2.H = lgcl(`YL2.H`))
fluo.gate <- gate_flowclust_2d(ex, "lgBL1.H", "lgYL2.H", K = k, quantile = q)
The prior specification has no effect when usePrior=no
Using the serial version of flowClust
ggcyto(ex, aes(x = lgBL1.H, y = lgYL2.H)) + geom_hex(bins = 64) + geom_gate(fluo.gate) + geom_stats()# + scale_x_logicle() + scale_y_logicle()
Even though flowClust is supposed to perform its own transformation
(modified Box-Cox), empirically I found the clustering seem to work
better on logicle transformed data for the two fluorescent channels.
Therefore I’m transforming the underlying data of the GatingSet. Note
that it seems to be difficult to “create new parameters” to store the
transformed data, while keeping the original data intact. Instead, the
transformation functions constructed using the constructor
logicle_trans() stores the inverse transformation
functions, which can be used to perform the inverse transformation when
needed. Followed the manual for GatingSet here
lgcl <- logicle_trans()
transList <- transformerList(c(lgBL1.H = "BL1.H", lgclYL2.H = "YL2.H"), lgcl)
transform(gs, transList)
A GatingSet with 96 samples
to obtain the original data, use
gs_pop_get_data(gs[[1]], inverse.transform = TRUE)
Now we can do the flowClust gating
dat <- gs_pop_get_data(gs, "-outlier") # get parent data
inductionGate <- fsApply(dat, function(fr)
openCyto::gate_flowclust_2d(fr, "BL1.H", "YL2.H", K = k, quantile = q)
)
gs_pop_add(gs, inductionGate, parent = "-outlier", name = "induction")
recompute(gs)
ggcyto(gs, aes(x = BL1.H, y = YL2.H), subset = "-outlier") + geom_hex(bins = 64) + geom_gate("induction") + geom_stats() +
facet_wrap(~name, ncol = 10) + theme_bw()
Notice that 217-373 had very few cells and were repeated on another day.
Because the amount of fluorescence scales with cell size, comparing the median of each sample would only be valid if the cell size distribution is approximately the same across samples. Is that the case?
mult_format <- function() {
function(x) format(x/10000,digits = 2)
}
ggcyto(gs, aes(x = FSC.H), subset = "induction") + geom_density(fill = "forestgreen") +
scale_x_continuous(labels = mult_format(), name = "FSC.H x 10000") +
facet_wrap(~name, ncol = 10, scales = "free_y") +
theme(axis.text.y = element_blank())
Brian Metzger and colleagues proposed a simple correction in their 2015 paper. The intuition behind this method is that FSC is proportional to the max 2d projection (area) of a cell, and thus FSC^(3/2) should be roughly proportional to the volume. By contrast, the fluorescent channels should be the cumulative intensity from the whole cell. Therefore dividing FP intensity by FSC^(3/2) should effectively remove the variation due to cell size difference. He did say that the Wittkopp lab later switched to a different method based on PCA and rotations. After reading those papers, I thought the simpler one will suffice for us.
Test normalization formula
# get population data
fs.out <- gs_pop_get_data(gs, y = "induction", inverse.transform = TRUE) # get the inverse transformed data
# come up with an approximate FSC.H value for an average event to be used a scalar for the next step
mfsc <- 5e5 # based on the mode of the median of FSC.H from all samples
# fs.out is of the cytoframe class, which is a reference class. need to convert to flowframe for transformation
# https://www.bioconductor.org/packages/devel/bioc/vignettes/flowWorkspace/inst/doc/flowWorkspace-Introduction.html
exponent <- 1.5
# ex <- cytoframe_to_flowFrame(fs.out[["A9"]]) %>%
# transform(nFSC = FSC.H/mfsc) %>%
# transform(nGFP = BL1.H/nFSC^(exponent), nRFP = YL2.H/nFSC^(exponent))
ex <- cytoset_to_flowSet(fs.out) %>%
transform(nFSC = FSC.H/mfsc) %>%
transform(nGFP = BL1.H/nFSC^(exponent), nRFP = YL2.H/nFSC^(exponent))
plot the examples
ggplot(ex, aes(x = nFSC, y = nRFP/1e3)) + geom_hex(bins = 64) + scale_y_log10() + stat_smooth(method = "lm") +
scale_fill_gradientn(colours = rev(RColorBrewer::brewer.pal(11, "Spectral"))) + facet_wrap(~name, scales = "free_y") +
theme_bw()
Warning: Transformation introduced infinite values in continuous y-axis
Warning: Transformation introduced infinite values in continuous y-axis
Warning: Removed 17 rows containing non-finite values (stat_binhex).
`geom_smooth()` using formula 'y ~ x'
Warning: Removed 17 rows containing non-finite values (stat_smooth).
# calculate ratio
norm.data <- fsApply(fs.out, function(cf) {
cf <- cbind(cf,
nRFP = cf[,"YL2.H"] / (cf[, "FSC.H"]/mfsc)^(exponent),
nGFP = cf[,"BL1.H"] / (cf[, "FSC.H"]/mfsc)^(exponent))
apply(cf[, c("FSC.H", "BL1.H", "YL2.H", "nGFP", "nRFP")], 2, median)
}, use.exprs = TRUE) %>%
as_tibble(rownames = "name") %>%
mutate(across(BL1.H:nRFP, ~ round(.x, 1)))
The goal is to export the gated events and calculate the RFP/GFP and take the median, which will be used in downstream analyses.
Get the population stats
stats <- gs_pop_get_stats(gs) %>%
as_tibble() %>%
mutate(pop = gsub(".*/", "", pop), pop = gsub("-outlier", "cells", pop)) %>%
pivot_wider(names_from = pop, names_prefix = "n_", values_from = count)
Export the data
# pull all info together in a single tibble
final <- left_join(as_tibble(pData(fs)), stats, by = c("name" = "sample")) %>%
left_join(norm.data, by = "name")
write_tsv(final, "20220512-gated-median-out.txt")